Learning Pronunciation Rules for English Graphemes Using the Version Space Algorithm

Authors

  • Howard J. Hamilton
  • Jian Zhang
Abstract

We describe a technique for learning pronunciation rules based on the Version Space algorithm. In particular, we describe how to learn pronunciation rules for a representative subset of the English graphemes. We present a learning procedure called LEP-G.1 (learning to pronounce English graphemes) that learns English pronunciation rules from examples in the form of word-pronunciation pairs. With our approach, we can translate not only English words in dictionaries, but also new words such as tuple, pixel, and deque, which are not found in dictionaries. An experiment in which LEP-G.1 learned pronunciation rules for 12 graphemes strongly suggests that learning the other possible 52 graphemes in English is feasible.
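
The Version Space idea the abstract relies on can be illustrated with a minimal sketch. This is a simplified textbook candidate-elimination loop over conjunctive hypotheses, not the LEP-G.1 procedure itself, whose details are not given here. The feature encoding (position of the grapheme, class of the following letter), the toy examples for the grapheme "c" pronounced /s/, and all function names are assumptions made only for illustration. The sketch also assumes the first training example is positive.

```python
ANY = "?"  # wildcard: the attribute may take any value


def covers(h, x):
    """True if hypothesis h matches instance x."""
    return all(hv == ANY or hv == xv for hv, xv in zip(h, x))


def more_general_or_equal(h1, h2):
    """True if h1 matches every instance that h2 matches."""
    return all(a == ANY or a == b for a, b in zip(h1, h2))


def candidate_elimination(examples):
    """Return (S, G): the specific and general boundaries of the version space.

    examples: list of (instance_tuple, is_positive) pairs.
    Simplification (assumption): the first example must be positive, so that
    S can guide how G is specialized without an explicit attribute domain.
    """
    n = len(examples[0][0])
    S = None                  # most specific hypothesis: matches nothing yet
    G = [tuple([ANY] * n)]    # most general hypothesis: matches everything
    for x, positive in examples:
        if positive:
            # Drop general hypotheses that fail to cover the positive example.
            G = [g for g in G if covers(g, x)]
            # Minimally generalize S so that it covers x.
            if S is None:
                S = tuple(x)
            else:
                S = tuple(s if s == xv else ANY for s, xv in zip(S, x))
        else:
            if S is None:
                raise ValueError("this sketch assumes the first example is positive")
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)  # already excludes the negative example
                    continue
                # Minimally specialize g, using S to pick values that still
                # cover all positives but exclude x. If S itself covers x,
                # nothing is added and G may become empty (inconsistent data).
                for i in range(n):
                    if g[i] == ANY and S[i] != ANY and S[i] != x[i]:
                        new_G.append(g[:i] + (S[i],) + g[i + 1:])
            new_G = list(dict.fromkeys(new_G))  # remove duplicates, keep order
            # Keep only maximally general members.
            G = [g for g in new_G
                 if not any(h != g and more_general_or_equal(h, g) for h in new_G)]
    return S, G


# Hypothetical toy data: is the grapheme "c" pronounced /s/?
# Instance = (position of "c" in the word, class of the following letter).
examples = [
    (("initial", "front"), True),       # "city"
    (("initial", "back"), False),       # "cat"
    (("medial", "front"), True),        # "recite"
    (("initial", "consonant"), False),  # "clap"
    (("medial", "back"), False),        # "bacon"
]

S, G = candidate_elimination(examples)
print("S:", S)
print("G:", G)
```

On this toy data both boundaries converge to the single hypothesis ("?", "front"), that is, "c" is pronounced /s/ whenever the next letter is a front vowel, regardless of position. Real pronunciation data contains exceptions and disjunctive patterns that a single conjunctive hypothesis like this cannot capture.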

Related resources

Inferring Hierarchical Pronunciation Rules from a Phonetic Dictionary

This work presents a new phonetic transcription system based on a tree of hierarchical pronunciation rules expressed as context-specific grapheme-phoneme correspondences. The tree is automatically inferred from a phonetic dictionary by incrementally analyzing deeper context levels, eventually representing a minimum set of exhaustive rules that pronounce without errors all the words in the train...

The Iterated Version Space Algorithm

We present the Iterated Version Space Algorithm (IVSA), which retains many advantages of the Version Space Algorithm while handling disjunctive concepts and noise. IVSA repeatedly generates hypotheses for regions of the concept space and combines these regional hypotheses to produce an overall concept hypothesis. Experiments were conducted to learn English pronunciation rules from word pronunciat...

Exploring Grapheme-to-Phoneme Induction with Machine Learning

Text-to-speech (TTS) systems have increasingly found use in the modern world. One of the subproblems of TTS is determining the phonetic structure of words, i.e., their pronunciation, from their orthography, i.e., their spelling. This is known as the grapheme-to-phoneme (G2P) problem. In all languages this is a nontrivial task, but particularly in English, a language with rich historiolinguistic...

Using a hybrid approach to build a pronunciation dictionary for Brazilian Portuguese

This paper describes the method employed to build a machine-readable pronunciation dictionary for Brazilian Portuguese. The dictionary makes use of a hybrid approach for converting graphemes into phonemes, based on both manual transcription rules and machine learning algorithms. It makes use of a word list compiled from the Portuguese Wikipedia dump. Wikipedia articles were transformed into plai...

Learning English Grapheme Segmentation Using the Iterated Version Space Algorithm

Our unique approach for learning English grapheme segmentation (LE-GS) rules using the Iterated Version Space Algorithm (IVSA) is presented. After defining the problem and our representation for the instances and hypotheses, we illustrate the LE-GS approach by tracing a specific example. Experimental results based on a ten-fold testing methodology are given to show the performance of the LE-GS learnin...

Publication date: 1994